Exploiting Monotone Convergence Functions in Parallel Programs
Scientific codes which use iterative methods are often difficult to
parallelize well. Such codes usually contain while loops which
iterate until they converge upon the solution. Problems arise since
the number of iterations cannot be determined at compile time, and
tests for termination usually require a global reduction and an
associated barrier. We present a method which allows us to avoid
performing global barriers and exploit pipelined parallelism when
processors can detect non-convergence from local information.
(Also cross-referenced as UMIACS-TR-96-31.1)
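The idea in the abstract above, that a processor whose local data has not yet converged already knows the global computation has not converged and so can skip the global reduction and barrier, can be sketched as follows. This is only an illustration of the idea (a serial Python/NumPy simulation of a chunked Jacobi iteration, not the paper's algorithm); the function name, chunking scheme, and tolerance are all assumptions.

```python
import numpy as np

def jacobi_with_local_tests(b, tol=1e-8, max_iters=10_000, nchunks=4):
    """Jacobi iteration for 2*x[i] - x[i-1] - x[i+1] = b[i], x[0] = x[n-1] = 0.

    Each 'processor' (chunk) first checks a purely local convergence test;
    a chunk with a large local change proves, with no communication, that
    the global computation has not converged, so the global reduction
    (and its barrier) is only paid for when every local test passes."""
    n = len(b)
    x = np.zeros(n)
    cuts = np.linspace(0, n, nchunks + 1, dtype=int)   # chunk boundaries
    global_reductions = 0
    for it in range(1, max_iters + 1):
        x_new = x.copy()
        x_new[1:-1] = (b[1:-1] + x[:-2] + x[2:]) / 2.0
        delta = np.abs(x_new - x)
        x = x_new
        # Per-chunk ("per-processor") local convergence tests.
        local_ok = [delta[lo:hi].max(initial=0.0) < tol
                    for lo, hi in zip(cuts[:-1], cuts[1:])]
        if all(local_ok):
            # Only now simulate the expensive global reduction.
            global_reductions += 1
            if delta.max() < tol:
                return x, it, global_reductions
    return x, max_iters, global_reductions
```

In this serial simulation the global reduction is reached exactly once, on the iteration where every chunk's local test already passes; in a real distributed run each processor would see only its own chunk and would keep iterating (pipelining with its neighbors) while its local test fails.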
Parallelizing Julia with a Non-Invasive DSL (Artifact)
This artifact is based on ParallelAccelerator, an embedded domain-specific language (DSL) and compiler for speeding up compute-intensive Julia programs. In particular, Julia code that makes heavy use of aggregate array operations is a good candidate for speeding up with ParallelAccelerator. ParallelAccelerator is a non-invasive DSL that makes as few changes to the host programming model as possible.
Parallelizing Julia with a Non-Invasive DSL
Computational scientists often prototype software using productivity
languages that offer high-level programming abstractions. When higher
performance is needed, they are obliged to rewrite their code in a
lower-level efficiency language. Different solutions have been
proposed to address this trade-off between productivity and
efficiency. One promising approach is to create embedded
domain-specific languages that sacrifice generality for productivity
and performance, but practical experience with DSLs points to some
roadblocks preventing widespread adoption. This paper proposes a
non-invasive domain-specific language that makes as few visible
changes to the host programming model as possible. We present ParallelAccelerator,
a library and compiler for high-level, high-performance scientific
computing in Julia. ParallelAccelerator's programming model is aligned with existing
Julia programming idioms. Our compiler exposes the implicit
parallelism in high-level array-style programs and compiles them to
fast, parallel native code. Programs can also run in "library-only"
mode, letting users benefit from the full Julia environment and
libraries. Our results show encouraging performance improvements while requiring very few changes to source code. In particular, few to no additional type annotations are necessary.
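The "aggregate array operations" idiom that exposes implicit parallelism can be sketched as follows. This is a Python/NumPy analogy to Julia's array style, not ParallelAccelerator itself: the element-by-element loop nest hides the parallelism, while the whole-array formulation makes every output element's independence visible to a compiler.

```python
import numpy as np

def blur_loops(img):
    # Element-wise formulation: a compiler sees a sequential loop nest.
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = (img[i-1, j] + img[i+1, j] +
                         img[i, j-1] + img[i, j+1] + img[i, j]) / 5.0
    return out

def blur_arrays(img):
    # Aggregate array formulation: each result element depends only on
    # the input array, so the implicit parallelism is explicit.
    out = img.copy()
    out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] +
                       img[1:-1, 1:-1]) / 5.0
    return out
```

Both formulations compute the same five-point stencil; a compiler like the one described can take the array-style version and emit parallel native code, while "library-only" mode would simply execute the array operations as-is.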
Transitive Closure of Infinite Graphs and its Applications
Integer tuple relations can concisely summarize many types of
information gathered from analysis of scientific codes. For example,
they can be used to precisely describe which iterations of a statement
are data dependent on which other iterations. It is generally not
possible to represent these tuple relations by enumerating the related
pairs of tuples. For example, it is impossible to enumerate the
related pairs of tuples in the relation {[i] -> [i+2] | 1 <= i <=
n-2}. Even when it is possible to enumerate the related pairs of
tuples, such as for the relation {[i,j] -> [i',j'] | 1 <= i,j,i',j' <=
100}, it is often not practical to do so. We instead use a closed form
description by specifying a predicate consisting of affine constraints
on the related pairs of tuples. As we just saw, these affine
constraints can be parameterized, so what we are really describing are
infinite families of relations (or graphs). Many of our applications
of tuple relations rely heavily on an operation called transitive
closure. Computing the transitive closure of these "infinite graphs"
is very different from the traditional problem of computing the
transitive closure of a graph whose edges can be enumerated. For
example, the transitive closure of the first relation above is the
relation {[i] -> [i'] | exists beta s.t. i'-i = 2beta and 1 <= i <= i'
<= n}. As we will prove, transitive closure is not computable in the
general case. We have developed algorithms that produce exact results
in most commonly occurring cases and produce upper or lower bounds (as
necessary) in the other cases. This paper will describe our algorithms
for computing transitive closure and some of its applications such as
determining which inter-processor synchronizations are redundant.
(Also cross-referenced as UMIACS-TR-95-48)
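The closed form given above can be checked by brute force for a small value of n (a sketch for intuition, not the authors' algorithm, which works symbolically on the affine constraints): enumerate the relation {[i] -> [i+2] | 1 <= i <= n-2}, compute its transitive closure by iterated composition, and compare against the predicate, taking beta >= 1 so that only genuinely reachable pairs appear.

```python
def transitive_closure(pairs):
    # Iterated relational composition to a fixed point: R+ = R ∪ (R ∘ R+) ∪ ...
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

n = 12
R = {(i, i + 2) for i in range(1, n - 1)}   # {[i] -> [i+2] | 1 <= i <= n-2}
# Closed form from the text, with beta >= 1:
# {[i] -> [i'] | exists beta >= 1 s.t. i' - i = 2*beta and 1 <= i <= i' <= n}
closed_form = {(i, ip)
               for i in range(1, n + 1)
               for ip in range(i + 2, n + 1, 2)}
assert transitive_closure(R) == closed_form
```

The enumeration only works because n is fixed and small; the point of the paper is precisely that for symbolic n the "graph" is an infinite family and the closure must be computed on the constraint representation itself.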
Compiler Support for Sparse Tensor Computations in MLIR
Sparse tensors arise in problems in science, engineering, machine learning,
and data analytics. Programs that operate on such tensors can exploit sparsity
to reduce storage requirements and computational time. Developing and
maintaining sparse software by hand, however, is a complex and error-prone
task. Therefore, we propose treating sparsity as a property of tensors, not a
tedious implementation task, and letting a sparse compiler generate sparse code
automatically from a sparsity-agnostic definition of the computation. This
paper discusses integrating this idea into MLIR.
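The gap between a sparsity-agnostic definition and the sparse code a compiler generates from it can be sketched as follows. This is a plain-Python illustration of the concept (a dense matrix-vector product versus the same computation over a compressed-sparse-row format), not the MLIR implementation the abstract describes.

```python
def dense_matvec(A, x):
    # Sparsity-agnostic definition: y[i] = sum_j A[i][j] * x[j].
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def to_csr(A):
    # Compressed sparse row (CSR) storage: keep only the nonzeros.
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    # The kind of loop a sparse compiler emits: iterate over nonzeros only,
    # so time and storage scale with the number of nonzeros, not n*m.
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]
```

Writing and maintaining the CSR version by hand for every kernel and every storage format is exactly the complex, error-prone task the abstract proposes to automate by making sparsity a declared property of the tensor.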
Generating Efficient Stack Code for Java
Optimizing Java byte code is complicated by the fact that it uses a stack-based execution model. Changing the intermediate representation from stack-based to register-based brings the problem of Java byte code optimization into the well-studied domain of compiler optimizations for register-based codes. In this paper we describe a technique to convert register-based code into Java byte code. The code generation techniques developed for stack-based computers are not directly applicable to this problem, as the comparative cost of local-memory and stack-manipulation instructions in the JVM is quite different from that in stack-based computers. A naive, verbatim translation of register-based code into Java byte code produces code with many redundant store and load instructions. The tool that we have developed removes 90-100% of the stores to local (i.e., non-global) variables. It produces Java byte code that is slightly faster and shorter than...
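The redundant store/load pattern the abstract mentions can be sketched as a tiny peephole pass. This is an illustration in Python with made-up instruction tuples mimicking JVM mnemonics, not the authors' tool: an `istore v` immediately followed by `iload v` is elided when `v` is never loaded again, since the value is already on top of the operand stack.

```python
def remove_redundant_store_loads(code):
    """Peephole sketch (hypothetical representation, not the paper's tool):
    drop an ('istore', v) / ('iload', v) pair when v is never read again,
    because the stored value is still sitting on top of the stack."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if (op[0] == "istore" and i + 1 < len(code)
                and code[i + 1] == ("iload", op[1])
                and ("iload", op[1]) not in code[i + 2:]):
            i += 2          # value stays on the stack; store/load elided
        else:
            out.append(op)
            i += 1
    return out
```

For example, the naive translation of `return a + b` via a temporary `t` produces `iload a; iload b; iadd; istore t; iload t; ireturn`; the pass reduces it to `iload a; iload b; iadd; ireturn`. A real tool must also handle the case where `v` is loaded again later (e.g. by duplicating the stack top before the store), which this sketch omits.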
On Fast Array Data Dependence Tests
Array data-dependence analysis is an important part of any optimizing compiler for scientific programs. The Omega test is an exact test for integer solutions to affine constraints and can be used for array data dependence. There are other tests that are less exact but are intended to be faster. Many of these less exact tests are rather complicated, designed to be as accurate as possible while still being fast. In this paper, we describe the Epsilon test, intended to be as simple and as fast as possible while not being embarrassingly inaccurate. We explore the relative speed and accuracy of the Epsilon and Omega tests, and discuss how they might be combined. We also point out serious errors in recently published work on array data dependence tests.

1 Introduction. Array data dependence analysis is an important part of any optimizing compiler for scientific programs. Consider the following code fragment:

    for i = 1 to n do
    1:  a[i] := ...
    2:  ... := a[i-1]

At each iteration the read reaches...
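The question such tests answer for the fragment above is whether the write to `a[i]` and the read of `a[i-1]` can ever touch the same array element within the loop bounds. A minimal sketch of an exact single-subscript test (illustrative only; neither the Epsilon nor the Omega test, whose actual machinery is far more efficient): a GCD test to rule out integer solutions quickly, then a bounds check, done here by brute-force enumeration.

```python
from math import gcd

def has_dependence(a, b, c, d, lo, hi):
    """Can the write to x[a*i + b] and the read of x[c*j + d] touch the
    same element for some iterations lo <= i, j <= hi?  Sketch only:
    solves a*i - c*j = d - b over the integers within bounds."""
    rhs = d - b
    g = gcd(a, c)
    if g and rhs % g != 0:
        return False        # GCD test: no integer solution at all
    # Bounds check by enumeration (fine for a sketch; real tests use
    # Banerjee-style bounds or exact methods like the Omega test).
    return any(a * i + b == c * j + d
               for i in range(lo, hi + 1)
               for j in range(lo, hi + 1))
```

For the fragment above (`a[i]` written, `a[i-1]` read, subscripts `1*i + 0` and `1*j - 1`), `has_dependence(1, 0, 1, -1, 1, 10)` reports a loop-carried dependence of distance 1, while a pair like `a[2*i]` versus `a[2*j + 1]` fails the GCD test and is independent.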